This report is for ETC5521 Assignment 1 by Team wallaby, comprising Helen Evangelina and Rahul Bharadwaj.
Music, in a broad sense, is any art composed of sound. It expresses people’s thoughts and feelings, reflects the author’s life experience, and brings listeners both the enjoyment of beauty and an outlet for human emotion. Music is also a form of social behavior through which people exchange feelings and life experiences.
In ancient times, music was played at court banquets, or when talented people visited scenic places, to add to the enjoyment. In modern times, classical music has a high barrier to entry and its development has largely plateaued, so its audience has become very small, while pop music (the general name for popular songs, including Rock, R&B, Latin, etc.) is gradually showing its own characteristics. Modern songs have therefore quietly taken the top position in people’s hearts because they convey emotion and life experience so well. Listening to pop music has also become one of the most common forms of daily entertainment.
Spotify is a licensed music streaming platform supported by Warner Music, Sony, EMI and other major record companies around the world. It now has more than 60 million users and is the world’s leading large-scale online music streaming platform.
Because Spotify holds a large amount of user data, four interested users, Charlie Thompson, Josiah Parry, Donal Phipps, and Tom Wolff, created the spotifyr package, which uses Spotify’s API to make it easier for people to learn about their own preferences or about mainstream listening habits. In addition to spotifyr data, our data also draws on a blog post by Kaylin Pavlik, in which six main categories (EDM, Latin, pop, R&B, rap, rock) are used to classify 5000 songs. Combining the two sources is very useful for studying the popularity of pop music.
Nowadays, music plays an important role in people’s lives and helps them manage and improve their quality of life. As fans of music, we not only enjoy it, but also wonder how music moves people with simple tones, rhythms, timbres and words. How popular is each genre? How much influence do the genre and the various attributes of a song have on its popularity? Does it make us dance or sing unconsciously, or does it convey our emotions and reflect our thoughts? The curiosity behind all these questions drives this analysis.
By doing this exploratory data analysis, we want to know:
Primary Question: What audio features are capable of making an impact on the popularity of music artworks and contribute to the emergence of Top Songs?
Sub Questions:
Since 1957, what are the audio features of the top artists who have released the most songs?
Explore the works of our favorite artist, Coldplay: for example, how much musical positiveness do their albums convey?
There are plenty of modern music genres nowadays. What unique style or charm makes a genre stand out and become people’s first choice?
Questions Added to enhance the scope of the analysis:
This helps us enhance the scope of the primary analysis and broadens our understanding of the relations between popularity and audio features.
Data Collection Methods:
The spotifyr package can extract audio characteristics and other related information for tracks from Spotify’s Web API in batches. For example, to search for an artist, you just type in the artist’s name, and all of their albums or songs are listed in seconds.
The spotifyr package also records the popularity metrics of all tracks and albums, so it is easy to examine the correlation between music popularity and music characteristics. Jon Harmon and Neal Grantham then took data extracted with spotifyr and added the content of Kaylin Pavlik’s recent blog post, which divides nearly 5000 songs into genres, to produce the dataset distributed through the tidytuesdayr package that we use for this assignment.
We chose music works created by artists that can be found on Spotify from January 1, 1957 to January 29, 2020.
## Rows: 32,833
## Columns: 24
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
## $ track_id <chr> "6f807x0ima9a1j3VPbc7VN", "0r7CVbZTWZgbTCY...
## $ track_name <chr> "I Don't Care (with Justin Bieber) - Loud ...
## $ track_artist <chr> "Ed Sheeran", "Maroon 5", "Zara Larsson", ...
## $ track_popularity <int> 66, 67, 70, 60, 69, 67, 62, 69, 68, 67, 58...
## $ track_album_id <chr> "2oCs0DGTsRO98Gh5ZSl2Cx", "63rPSO264uRjW1X...
## $ track_album_name <chr> "I Don't Care (with Justin Bieber) [Loud L...
## $ track_album_release_date <chr> "2019-06-14", "2019-12-13", "2019-07-05", ...
## $ playlist_name <chr> "Pop Remix", "Pop Remix", "Pop Remix", "Po...
## $ playlist_id <chr> "37i9dQZF1DXcZDD7cfEKhW", "37i9dQZF1DXcZDD...
## $ playlist_genre <chr> "pop", "pop", "pop", "pop", "pop", "pop", ...
## $ playlist_subgenre <chr> "dance pop", "dance pop", "dance pop", "da...
## $ danceability <dbl> 0.748, 0.726, 0.675, 0.718, 0.650, 0.675, ...
## $ energy <dbl> 0.916, 0.815, 0.931, 0.930, 0.833, 0.919, ...
## $ key <int> 6, 11, 1, 7, 1, 8, 5, 4, 8, 2, 6, 8, 1, 5,...
## $ loudness <dbl> -2.634, -4.969, -3.432, -3.778, -4.672, -5...
## $ mode <int> 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, ...
## $ speechiness <dbl> 0.0583, 0.0373, 0.0742, 0.1020, 0.0359, 0....
## $ acousticness <dbl> 0.10200, 0.07240, 0.07940, 0.02870, 0.0803...
## $ instrumentalness <dbl> 0.00e+00, 4.21e-03, 2.33e-05, 9.43e-06, 0....
## $ liveness <dbl> 0.0653, 0.3570, 0.1100, 0.2040, 0.0833, 0....
## $ valence <dbl> 0.518, 0.693, 0.613, 0.277, 0.725, 0.585, ...
## $ tempo <dbl> 122.036, 99.972, 124.008, 121.956, 123.976...
## $ duration_ms <dbl> 194754, 162600, 176616, 169093, 189052, 16...
Data Table -
A Visual Overview of the Data:
Visual Representation of the Dataset
A picture speaks a thousand words. Thus, we represent the data in a simple and elegant visualization that describes the same column names and types described previously through text.
Since our analysis focuses on correlations between audio features, it is a good idea to have some overview as to how the numerical fields correlate.
A Visual Representation of the Correlation of numeric data
Clean Data with necessary columns
Top 20 Artists who wrote the most songs from 1957 to 2020
Similarly, the figure above shows the same information in a bar plot. This helps to deepen our impression of the top 20 singers and gives an intuitive understanding of the gaps between them. As mentioned previously, pictures convey far more than tables and text.
We filter artists whose popularity is greater than 95, and then visualize them in a radar plot. This way, the singers at the top can be identified at a glance. At the same time, music lovers can see the characteristics of these top singers’ music.
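The filtering step can be sketched in base R as follows. This is a minimal illustration rather than the code used for the figure: the data frame is an invented stand-in, and we assume the threshold is applied to the `track_popularity` column.

```r
# Toy stand-in for the songs data; all values are invented for illustration.
songs <- data.frame(
  track_artist     = c("Artist A", "Artist B", "Artist C", "Artist D"),
  track_popularity = c(98, 72, 97, 95),
  danceability     = c(0.75, 0.60, 0.82, 0.55),
  energy           = c(0.52, 0.88, 0.61, 0.70),
  stringsAsFactors = FALSE
)

# Keep only tracks whose popularity is strictly greater than 95.
top_tracks <- songs[songs$track_popularity > 95, ]

# Order from most to least popular so the leaders come first.
top_tracks <- top_tracks[order(-top_tracks$track_popularity), ]
print(top_tracks)
```

Note that a track with a popularity of exactly 95 is excluded, because the comparison is strict.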
Characteristics of Top Singers
The height of each pie segment shows the level of popularity. The color intensity shows the energy levels of the songs by that artist and different colors represent genre. The blue outline describes the danceability of the songs. This way, we can perceive three audio features at the same time along with Track Popularity.
From the figure above, we can see that Maroon 5, The Weeknd, Roddy Ricch and KAROL G lead in popularity. It is also clear that popular singers usually produce songs in several genres rather than limiting themselves to one.
Next, the artists differ greatly in musical style. For example, the brightness of the colors shows that the Energy of Maroon 5’s and Billie Eilish’s music is not very high. This is not a shortcoming but a description of their style, which is lyrical and soft. Judging from the color of each segment’s outline, Roddy Ricch’s and Trevor Daniel’s works have the highest danceability, a measure that combines tempo, rhythm stability, beat strength, and overall regularity.
In this part, we take one artist as an example for a more detailed exploratory analysis using the “spotifyr” package. Here we choose Coldplay, our favorite artist.
First, we loaded all the Coldplay albums available on Spotify and dropped the duplicates (some live tour albums duplicate existing ones). We then calculated the average valence of each album. The results are shown in the following table.
| album_name | valence |
|---|---|
| Everyday Life | 0.30 |
| Viva La Vida or Death and All His Friends | 0.26 |
| Mylo Xyloto | 0.25 |
| Parachutes | 0.23 |
| A Head Full of Dreams | 0.23 |
| X&Y | 0.22 |
| Ghost Stories | 0.21 |
| Love in Tokyo | 0.19 |
| A Rush of Blood to the Head | 0.18 |
According to the Spotify tracks documentation, the valence variable is measured from 0.0 to 1.0 and describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The highest average valence among these albums is 0.30 and the lowest is 0.18, which means that Coldplay’s songs usually sound more negative than positive to the audience.
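The per-album average can be sketched in base R as below. This is an illustration under stated assumptions, not the original code: the album names, track names and valence values are invented, and duplicates are assumed to be identifiable by the combination of `album_name` and `track_name`.

```r
# Invented example tracks; the last row duplicates an existing track.
tracks <- data.frame(
  album_name = c("Album X", "Album X", "Album Y", "Album Y", "Album Y", "Album X"),
  track_name = c("t1", "t2", "t3", "t4", "t5", "t1"),
  valence    = c(0.20, 0.40, 0.10, 0.20, 0.30, 0.20),
  stringsAsFactors = FALSE
)

# Drop rows that repeat an (album, track) pair already seen.
tracks <- tracks[!duplicated(tracks[, c("album_name", "track_name")]), ]

# Average valence per album, sorted from most to least positive.
album_valence <- aggregate(valence ~ album_name, data = tracks, FUN = mean)
album_valence <- album_valence[order(-album_valence$valence), ]
album_valence$valence <- round(album_valence$valence, 2)
print(album_valence)
```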
Second, we make a density plot to show the ranges and densities of valence of each album.
Valence Density of Coldplay Albums
| word | sentiment | n |
|---|---|---|
| love | positive | 7 |
| easy | positive | 4 |
| fall | negative | 4 |
| grace | positive | 4 |
| miss | negative | 4 |
acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.
danceability: Danceability describes how suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.
duration_ms: The duration of the track in milliseconds. (And duration_s in seconds, rounded.)
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
instrumentalness: Predicts whether a track contains no vocals.
key: The key the track is in.
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
loudness: The overall loudness of a track in decibels (dB).
mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
speechiness: Speechiness detects the presence of spoken words in a track.
tempo: The overall estimated tempo of a track in beats per minute (BPM).
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Audio Feature Density Plot
The next three box plots show how music attributes differ between Music Genres.
First, each Music Genre is mapped to a color, and the mapping is stored in a single tibble called “COLORS”. This lets the Music Genres be clearly distinguished by color, so the specific characteristics of each genre can be read from the box plots.
Average valence by Music Genre
Average Energy by Music Genre
The second plot above describes the relationship between Music Genre and Energy. Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. The plot clearly shows that EDM has the highest Energy, while Rhythm and Blues (R&B) has the lowest, which reflects the style of these two Music Genres. EDM mostly feels energetic, loud, and noisy to the listener, whereas R&B is mainly lyrical, slow and quiet, and so conveys less energy. Similarly, Rock has always been known for its flexible, bold expression and passionate rhythm, and it ranks second only to EDM.
Finally, the third plot describes the relationship between Music Genre and Speechiness. Speechiness detects the presence of spoken words in a track: the more words or sentences are spoken in a song, the closer the value is to 1.0. This attribute is interesting because it indicates whether artists tend to express ideas through the lyrics or express their feelings through the melody.
Average Speechiness by Music Genre
From the plot, Rap clearly takes first place, because Rap is characterized by rapidly delivered rhyming lyrics over a mechanical, rhythmic backing. It is worth noting that Rock and Pop are the lowest, which suggests that these two genres rely on melody and rhythm, rather than lyrics, to affect the audience.
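The genre comparison behind these box plots can be sketched in base R. The example below is only an illustration: the speechiness values and the hex colors are invented, and a named character vector stands in for the “COLORS” tibble.

```r
# Hypothetical genre-to-color lookup, standing in for the "COLORS" tibble.
genre_colors <- c(edm = "#1b9e77", rap = "#d95f02", rock = "#7570b3")

# Invented speechiness values for three genres.
songs <- data.frame(
  playlist_genre = c("rap", "rap", "rap", "rock", "rock", "rock", "edm", "edm", "edm"),
  speechiness    = c(0.20, 0.30, 0.25, 0.04, 0.05, 0.06, 0.08, 0.07, 0.09),
  stringsAsFactors = FALSE
)

# Median speechiness per genre: the center line a box plot would draw.
median_speech <- tapply(songs$speechiness, songs$playlist_genre, median)

# Rank genres from most to least speech-heavy.
ranked <- sort(median_speech, decreasing = TRUE)
print(ranked)

# Look up the color for each genre in ranked order, e.g. for a plot's fill.
print(genre_colors[names(ranked)])
```

The same lookup vector could be passed to a plotting function so that each genre keeps one color across all three plots.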
After describing the contents and internal relations of the three plots in detail, there are still many related attributes that have not been explored. The purpose of our group is to put up the most interesting parts together. If someone is interested, it is easy to continue and build upon the existing analyses.
| playlist_genre | n |
|---|---|
| edm | 6043 |
| rap | 5746 |
| pop | 5507 |
| r&b | 5431 |
| latin | 5155 |
| rock | 4951 |
The following figure shows the average popularity of songs released in different decades. To show the result clearly and make comparison easier, we split the result by genre.
Genre Popularity by Decade
EDM emerged in the 1970s, and its average popularity is 40 or lower in every decade. This suggests that EDM is not mainstream music today and appeals to a smaller group.
Latin and pop music have been popular since the 1960s. The 1970s were the golden age of Latin songs, while the 1960s and 1970s were the golden age of pop music. These old songs are popular even today!
R&B music has gone through ups and downs. The songs released from the 1980s to the 2000s are less popular than the others.
Rap music has been popular since the 1960s, and the oldest rap songs are still the most popular. The songs released in the 2000s have the lowest popularity now.
The popularity of rock music is quite stable across release decades, although songs released from the 1960s to the 1990s are more popular than the others.
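Grouping songs by decade and genre can be sketched in base R as follows. The rows are invented, and we assume the release year is the first four characters of `track_album_release_date`, which also works if an entry holds only a year.

```r
# Invented example rows using column names from the dataset.
songs <- data.frame(
  playlist_genre           = c("rock", "rock", "rock", "pop", "pop"),
  track_album_release_date = c("1975-03-01", "1979-11-20", "1991-09-24",
                               "2015-05-01", "2019-06-14"),
  track_popularity         = c(60, 70, 50, 80, 66),
  stringsAsFactors = FALSE
)

# Extract the release year, then round it down to the start of its decade.
songs$year   <- as.integer(substr(songs$track_album_release_date, 1, 4))
songs$decade <- (songs$year %/% 10) * 10

# Average popularity for every genre and decade combination.
by_decade <- aggregate(track_popularity ~ playlist_genre + decade,
                       data = songs, FUN = mean)
print(by_decade)
```

Integer division (`%/%`) keeps the bucketing explicit: 1975 and 1979 both map to 1970.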
The correlation between song features helps us explore why some music becomes popular. Each song is specific and unique, but the correlation plot lets us summarize songs with a small set of musical attributes. Any pair of attributes can be negatively correlated, positively correlated, or essentially uncorrelated. This matters for our later analysis of the properties of music.
For example, a song with high energy tends also to have high loudness, and is unlikely to be acoustic. A listener who likes more upbeat songs with higher valence could look for potential favorites among songs with high danceability, high energy, and more vocal content. The correlation plot is therefore a useful tool both for analyzing songs and for selecting songs by preferred attributes; the remaining relationships can be explored later.
Correlation between Audio Features
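A correlation matrix of this kind can be computed with base R’s `cor()`. The sketch below uses five invented tracks and only three of the audio features; it illustrates the technique and is not the code behind the figure.

```r
# Invented values for three audio features across five tracks.
features <- data.frame(
  energy       = c(0.90, 0.80, 0.40, 0.30, 0.60),
  loudness     = c(-3.0, -4.0, -9.0, -11.0, -6.0),
  acousticness = c(0.05, 0.10, 0.70, 0.80, 0.30)
)

# Pairwise Pearson correlations between all numeric columns.
cor_matrix <- cor(features, method = "pearson")
print(round(cor_matrix, 2))
```

In this toy data, energy rises with loudness and falls with acousticness, so the first pair has a positive coefficient and the second a negative one.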
After describing the unique information about audio features, now we pay attention to exploring whether these audio features contribute to a higher popularity. First we plot each audio feature of the songs and the popularity in the following figure.
Popularity vs Audio Feature
The figure shows that liveness has a negative relationship with popularity, and that there is no clear relationship between valence and popularity. A higher valence doesn’t necessarily make a song more popular. This is consistent with our sentiment analysis.
However, we are not sure whether the dot plots above can directly reveal the relationship between popularity and the audio features, so we also fit a linear regression model to check whether these audio features contribute to a higher popularity.
Here we kept only the songs with a popularity greater than 0, since a popularity value of 0 does not make sense in this model. The following table shows all the audio features with a p-value less than 0.05. Among the features with a positive effect, valence and danceability have the largest coefficients.
Acousticness, key, loudness, mode and tempo also have a positive relationship with popularity, while energy, instrumentalness, liveness and speechiness have a negative relationship with popularity, which is similar to the conclusion from the dot plots.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 744.18 | 15.80 | 47.10 | 0.00 |
| acousticness | 18.36 | 6.85 | 2.68 | 0.01 |
| danceability | 35.67 | 9.91 | 3.60 | 0.00 |
| duration_ms | 0.00 | 0.00 | -14.03 | 0.00 |
| energy | -253.44 | 11.32 | -22.39 | 0.00 |
| instrumentalness | -109.25 | 6.07 | -18.01 | 0.00 |
| key | 0.95 | 0.35 | 2.69 | 0.01 |
| liveness | -23.39 | 8.47 | -2.76 | 0.01 |
| loudness | 14.09 | 0.61 | 23.07 | 0.00 |
| mode | 6.61 | 2.58 | 2.57 | 0.01 |
| speechiness | -43.00 | 12.83 | -3.35 | 0.00 |
| tempo | 0.18 | 0.05 | 3.72 | 0.00 |
| valence | 37.22 | 6.08 | 6.12 | 0.00 |
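The modelling step can be sketched with base R’s `lm()`. Everything below is illustrative: the data are simulated with a known positive danceability effect and negative energy effect, and the coefficient table is built from `summary()` rather than from whichever helper produced the table above.

```r
set.seed(1)
n <- 200

# Simulated features; popularity depends on danceability (+) and energy (-).
songs <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  tempo        = runif(n, 60, 180)
)
songs$track_popularity <- 40 + 30 * songs$danceability -
  20 * songs$energy + rnorm(n, sd = 5)

# Pretend a few tracks have popularity 0, then exclude them as in the text.
songs$track_popularity[1:5] <- 0
modelled <- songs[songs$track_popularity > 0, ]

# Fit the linear model and collect the coefficient table.
fit   <- lm(track_popularity ~ danceability + energy + tempo, data = modelled)
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
coefs$term   <- rownames(coefs)

# Keep only terms that are significant at the 5% level.
significant <- coefs[coefs$p.value < 0.05, ]
print(round(significant[, c("estimate", "std.error", "statistic", "p.value")], 2))
```

Because coefficients depend on each feature’s scale, comparing the `statistic` column is often a safer way to rank features than comparing raw estimates.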
Popularity vs Audio Feature using Smooth Curves
We can observe that most of the audio features have almost no relation with popularity, except for Energy and Instrumentalness, which negatively affect popularity, and Danceability, which positively affects it. This trend is observed for tracks that have a popularity greater than 50.
This leads us to extend the analysis to danceability and check which music genres and artists are in line with this trend. The next analysis pursues our question as to what the unique selling point of each selected artist is.
Now that we have analyzed the correlation between different audio features, let’s explore how popular the artists are and why. This involves analyzing common audio features in the songs of the top artists.
We choose the following artists, who are regarded as among the best in their genres:
We select Danceability, Speechiness, Energy and Valence as our audio features since these best describe the genres we have selected.
Taylor Swift Audio Features
Eminem Audio Features
AC/DC Audio Features
Shakira Audio Features
Usher Audio Features
David Guetta Audio Features
Danceability -
Speechiness -
Energy -
Valence -
Firstly, audio features are positively or negatively correlated with track popularity. However, the value of an artwork can’t be measured by numbers alone. The popularity of music depends more on the artist’s own popularity, creative talent or singing ability, or on external factors such as world trends. Deliberately tailoring songs to particular audio features is not enough to make success likely.
Secondly, each top artist has their own artistic characteristics and is loved by specific groups of people. Top artists do not create music according to the trend; instead, they set their own trends for the world.
The six music genres that stand out in modern music also each have their own characteristics. Because their styles are so distinct, it is hard to pin down the reasons for their success. What we can do is determine the genre of each song according to its style.
Although Coldplay is one of the representative rock artists, their works convey more negative emotions. This is in line with the rebellious and critical spirit of rock music, a spirit that has always been respected by young people of different backgrounds. They stick to their own style, try unconventional musical approaches as far as possible, and reach people’s hearts with straightforward, profound and moving melodies. This also confirms our analysis that the lyrics of Coldplay’s songs convey negative emotions, which has not hurt their popularity but has instead helped make them top artists.
In conclusion, track popularity depends more on the singer’s own ability and attitude than on audio features. The main role of audio features is to reflect the singer’s musical style, rather than to increase popularity.
Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.
Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for “Grid” Graphics. https://CRAN.R-project.org/package=gridExtra.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Hvitfeldt, Emil. 2020. Textdata: Download and Load Various Text Datasets. https://CRAN.R-project.org/package=textdata.
Parry, Josiah, and Nathan Barr. 2020. Genius: Easily Access Song Lyrics from Genius.com. https://CRAN.R-project.org/package=genius.
Robinson, David, Alex Hayes, and Simon Couch. 2020. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.
Thompson, Charlie. 2017. “Spotifyr: R Wrapper for the ’Spotify’ Web API.” https://github.com/charlie86/spotifyr.
Tierney, Nicholas. 2017. “Visdat: Visualising Whole Data Frames.” JOSS 2 (16): 355. https://doi.org/10.21105/joss.00355.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2020. Skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.
Wei, Taiyun, and Viliam Simko. 2017. R Package “Corrplot”: Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Winston Chang. 2020. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wilke, Claus O. 2020. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.
Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2020. DT: A Wrapper of the Javascript Library ’Datatables’. https://CRAN.R-project.org/package=DT.
Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.